Developing the employment heatmap visualization
Canadian sentiment is currently low, weighed down by a high cost of living, global political instability, and sweeping layoffs across multiple sectors. For the 2025 plotnine contest, I wanted to explore current official Canadian labour statistics using plotnine, a data visualization library in Python.
Introduction
I am so happy that plotnine, a relatively new Python data visualization package, exists. plotnine is based on ggplot2, an R package that I have been using for almost a decade.
In this tutorial, I’ll walk through the process of creating my plotnine 2025 contest submission. The plot shows employment across Canadian industries, ranked by their percent change in monthly employment. To help visualize data across different industries, industry-specific plots are laid out in a “pseudo” interactive manner.
Setup
Data
The data can be downloaded using this bash script, or directly from StatCan’s website.
Parameters
In this initial code chunk we initialize some parameters so that, if needed later, we can rerun the entire notebook with different values (e.g. different years).
pyprojroot is similar to R’s here package: it lets us construct file paths relative to the project root. This is especially convenient for Quarto projects with complex file organization.
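To make the idea of "relative to the project root" concrete, here is a rough standard-library sketch of what a root-finding helper does: walk up from a starting directory until a marker file is found. The function name find_project_root and the list of markers are my own assumptions for illustration; pyprojroot's actual criteria may differ.

```python
from pathlib import Path


def find_project_root(start, markers=(".git", "pyproject.toml", "_quarto.yml")):
    """Return the nearest ancestor of `start` that contains a marker file.

    Hypothetical helper for illustration only, not pyprojroot's implementation.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No project root found above {start}")


# Usage (assuming the current directory sits inside a project):
# data_file = find_project_root(Path.cwd()) / "data" / "14100355.csv"
```

The benefit is that the same path expression works whether the notebook is run from the project root or from a nested folder.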
from pyprojroot import here
LABOUR_DATA_FILE = here() / "data" / "14100355.csv"
FIGURE_THEME_SIZE = (8, 6)
FILTER_YEAR = (2018, 2025)
Libraries
# Data manipulation
import polars as pl
import polars.selectors as cs
# Visualization
from plotnine import *
# Mizani helps customize the text and breaks on axes
from mizani.bounds import squish
import mizani.labels as ml
import mizani.breaks as mb
import textwrap # for wrapping long lines of text
# Custom extract and transform functions for plot data
from labourcan.data_processing import read_labourcan, calculate_centered_rank
Read and process data for graphing
The visualization required a fair amount of data processing, which is detailed on this page. The steps are summarized here:
read_labourcan returns a polars.DataFrame with:
- Unused columns removed
- Filtered to seasonally adjusted estimates only
- Filtered to Canada-level estimates
- Additional YEAR, MONTH, and DATE_YMD columns extracted from REF_DATE
- Sorted chronologically by year and month
See labour.qmd for details on data processing.
labour = read_labourcan(LABOUR_DATA_FILE)
labour_processed = calculate_centered_rank(labour)
A first attempt
The type of visual that’s being developed here is something like a heatmap of employment numbers.
We want a clean separation between industries that are growing and those that are shrinking. For that we rank industries by their % monthly change. But not just any ordinary rank: we center it around 0 so that sectors that are growing (% change > 0) have a positive rank and those that are shrinking have a negative rank.
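calculate_centered_rank lives in the project's labourcan package and isn't shown on this page, so purely as an illustration, here is how a centered rank for a single month could be computed in plain Python. The function name, the treatment of exact zeros, and the tie handling are my assumptions, not the actual implementation.

```python
def centered_rank(pdiffs):
    """Rank % changes around zero for one month.

    Growing sectors get 1, 2, 3, ... (smallest growth closest to zero),
    shrinking sectors get -1, -2, -3, ... (smallest loss closest to zero),
    and sectors with exactly no change get 0.
    """
    ranks = [0] * len(pdiffs)
    growing = sorted(
        (i for i, value in enumerate(pdiffs) if value > 0),
        key=lambda i: pdiffs[i],
    )
    shrinking = sorted(
        (i for i, value in enumerate(pdiffs) if value < 0),
        key=lambda i: pdiffs[i],
        reverse=True,
    )
    for rank, i in enumerate(growing, start=1):
        ranks[i] = rank
    for rank, i in enumerate(shrinking, start=1):
        ranks[i] = -rank
    return ranks


print(centered_rank([0.02, -0.01, 0.005, -0.03, 0.0]))
```

Plotted on the y axis, ranks like these stack growing sectors above the zero line and shrinking sectors below it, so the height of each column shows how many sectors moved in each direction that month.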
scale_color_gradient2 is a great option because it lets us set the center of the palette with midpoint=0.
(
    ggplot(
        (
            labour_processed.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", color="PDIFF"),
    )
    + geom_point(shape="s")
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    + scale_color_gradient2(
        limits=(-0.01, 0.01), low="#ff0000ff", high="#0000dbff", midpoint=0, oob=squish
    )
)
geom_point or geom_tile
The whitespace between points is distracting. I could make the points larger, but the amount of whitespace that remains ultimately depends on the ratio of point size to the range of the x and y axes, as well as on the figure size.
If we use geom_tile instead, which will plot rectangles specified by a center point, we can explicitly control the whitespace between tiles.
(
    ggplot(
        (
            labour_processed.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF"),
    )
    + geom_tile(height=0.95, width=30 * 0.95)  # <1>
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    + scale_fill_gradient2(
        limits=(-0.01, 0.01), low="#ff0000ff", high="#0000dbff", midpoint=0, oob=squish
    )
)
1. I added height=0.95 to add some whitespace between tiles vertically. To control the horizontal whitespace, we need to specify a width. Because we are using a datetime axis, the width must be given in units of days. Each tile here is a month, so we express it as roughly 30 days, hence width=30 * 0.95.
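Thirty days is only an approximation of a month's length. As a small aside, this standard-library sketch shows how the exact number of days per month could be computed if per-month widths were ever needed. The helper name tile_width_days and its fill parameter are hypothetical and not part of plotnine.

```python
import calendar


def tile_width_days(year, month, fill=0.95):
    """Width in days for a monthly tile, leaving (1 - fill) as whitespace."""
    days_in_month = calendar.monthrange(year, month)[1]
    return days_in_month * fill


# February is noticeably narrower than January.
print(tile_width_days(2025, 1))
print(tile_width_days(2025, 2))
```

For this plot the fixed 30-day width is close enough that the difference is hard to see, which is why I kept the simpler constant.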
Explicit color mapping with scale_fill_manual
I am fairly happy with scale_fill_gradient2 used with squish. We get a really nice palette that’s centered around 0. However, scale_fill_gradient2 is limited to 3 colors (low, mid, high), which doesn’t quite allow the more dynamic color palette that I’m seeking.
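For readers unfamiliar with squish: passing oob=squish tells the scale to clamp out-of-bounds values to the nearest limit instead of dropping them, so a month with a 5% swing still gets the most saturated color. The function below is my own minimal re-creation of that idea in plain Python, not mizani's code.

```python
def squish_sketch(values, limits=(0.0, 1.0)):
    """Clamp each value into [low, high]; a simplified stand-in for squish."""
    low, high = limits
    return [min(max(value, low), high) for value in values]


# With limits of +/- 1%, larger swings are pulled back to the boundary.
print(squish_sketch([-0.03, -0.005, 0.0, 0.02], limits=(-0.01, 0.01)))
```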
To be more explicit with the colors, I will bin the % change variable and then map each bin to a color manually using scale_fill_manual.
Bin with polars.Series.cut
labour_processed_cutted = (
    labour_processed.with_columns(
        pl.col("PDIFF")
        .cut(
            [
                -0.05,
                -0.025,
                -0.012,
                -0.0080,
                -0.0040,
                0,
                0.0040,
                0.0080,
                0.012,
                0.025,
                0.05,
            ]
        )
        .alias("PDIFF_BINNED")
    )
    .with_columns(
        pl.when(pl.col("PDIFF") == 0)
        .then(pl.lit("0"))
        .otherwise(pl.col("PDIFF_BINNED"))
        .alias("PDIFF_BINNED")
    )
    .sort("PDIFF")
    .with_columns(pl.col("PDIFF_BINNED"))
)
labour_processed_cutted.group_by("PDIFF_BINNED").len()
| PDIFF_BINNED | len |
|---|---|
| cat | u32 |
| null | 21 |
| "(0.004, 0.008]" | 1736 |
| "(0.012, 0.025]" | 1292 |
| "(-0.012, -0.008]" | 717 |
| "(-0.025, -0.012]" | 892 |
| … | … |
| "(0.025, 0.05]" | 315 |
| "(-inf, -0.05]" | 47 |
| "(-0.004, 0]" | 1999 |
| "(0.05, inf]" | 58 |
| "(-0.008, -0.004]" | 1201 |
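To make the bin labels in the output above easier to read, here is a plain-Python sketch of the same idea: right-closed intervals, plus a special "0" bin for exactly no change. It uses bisect from the standard library rather than polars, and the helper name bin_label is hypothetical. I only mean it to mirror the labels shown above, not to reproduce polars' internals.

```python
from bisect import bisect_left

BREAKS = [-0.05, -0.025, -0.012, -0.008, -0.004, 0, 0.004, 0.008, 0.012, 0.025, 0.05]


def bin_label(value, breaks=BREAKS):
    """Return the right-closed interval label that `value` falls into."""
    if value == 0:
        return "0"  # exact zero gets its own bin, as in the polars code
    # bisect_left counts the breaks strictly below value, so a value equal
    # to a break lands in the interval that ends at that break: (lo, hi].
    i = bisect_left(breaks, value)
    lo = "-inf" if i == 0 else str(breaks[i - 1])
    hi = "inf" if i == len(breaks) else str(breaks[i])
    return f"({lo}, {hi}]"


for pdiff in (-0.06, -0.001, 0, 0.005, 0.05, 0.2):
    print(pdiff, "->", bin_label(pdiff))
```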
(
    ggplot(
        (
            labour_processed_cutted.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(height=0.95)  # whitespace between tiles, vertically
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
)
scale_fill_manual for explicit color mapping
Now we need to order the levels, and map to a specific color palette.
We will make PDIFF = 0% (no change) gray, give positive values green and blue colors (growth = good), and give negative values red and orange colors (contraction = bad).
order = (
    labour_processed_cutted.drop_nulls()
    .sort("PDIFF")
    .select(pl.col("PDIFF_BINNED"))
    .unique(maintain_order=True)
    .to_series()
    .to_list()
)
labour_processed_cutted_ordered = labour_processed_cutted.with_columns(
    pl.col("PDIFF_BINNED").cast(pl.Enum(order))
)
color_mapping = {
    "(-inf, -0.05]": "#d82828ff",
    "(-0.05, -0.025]": "#fa6f1fff",
    "(-0.025, -0.012]": "#f1874aff",
    "(-0.012, -0.008]": "#f1b274ff",
    "(-0.008, -0.004]": "#FEE08B",
    "(-0.004, 0]": "#FFFFBF",
    "0": "#a8a8a8ff",
    "(0, 0.004]": "#E6F5D0",
    "(0.004, 0.008]": "#bce091ff",
    "(0.008, 0.012]": "#9ad65fff",
    "(0.012, 0.025]": "#78b552ff",
    "(0.025, 0.05]": "#5cb027ff",
    "(0.05, inf]": "#1f6fc6ff",
}
(
    ggplot(
        (
            labour_processed_cutted.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),  # <1>
    )
    + geom_tile(color="white")
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    + scale_fill_manual(values=color_mapping, breaks=order)  # <2>
)
1. Map fill to PDIFF_BINNED
2. Provide explicit color mapping to scale_fill_manual
The power of scale_fill_manual is that it gives much more explicit control over how color is mapped to data. The cost is that it takes a lot more effort and lines of code than scale_fill_gradient2, which works well out of the box.
The legend
…is mathematically accurate, but we are going to make it nicer to look at.
First let’s make the text more concise: we don’t need every bin to be labelled, and instead of listing the range, we can just describe the midpoint.
legend_labels = [
    "-5%",  # the ends can be labelled with the boundary e.g. implies <-5%
    "",
    "",
    "-1%",
    "",
    "",
    "No change",
    "",
    "",
    "",
    "1%",
    "",
    "5%",
]
(
    ggplot(
        labour_processed_cutted.filter(
            pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(color="white")
    + theme_tufte()
    + theme(
        figure_size=FIGURE_THEME_SIZE,
        axis_text_x=element_text(angle=90),
        legend_justification_right=1,
        legend_position="right",
        legend_text_position="right",
        legend_title=element_blank(),
        legend_key_spacing=0,
        legend_key_width=10,
        legend_key_height=10,
        legend_text=element_text(size=8),
    )
    + scale_fill_manual(values=color_mapping, breaks=order, labels=legend_labels)  # <1>
)
1. Provide the list legend_labels to scale_fill_manual
I originally wanted to make a horizontal legend, but this works much better.
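One caveat with the hand-written legend_labels list is that it has to stay aligned, position by position, with the bin order. As a possible alternative (a sketch, not what I used for the submission), the labels could be keyed by bin name and expanded against the order. Here bin_order is typed out by hand to stand in for the order list computed earlier, and sparse_labels is a hypothetical name.

```python
# Stand-in for the `order` list derived from the data earlier on this page.
bin_order = [
    "(-inf, -0.05]", "(-0.05, -0.025]", "(-0.025, -0.012]", "(-0.012, -0.008]",
    "(-0.008, -0.004]", "(-0.004, 0]", "0", "(0, 0.004]", "(0.004, 0.008]",
    "(0.008, 0.012]", "(0.012, 0.025]", "(0.025, 0.05]", "(0.05, inf]",
]

# Only the bins that should carry text in the legend.
sparse_labels = {
    "(-inf, -0.05]": "-5%",
    "(-0.012, -0.008]": "-1%",
    "0": "No change",
    "(0.012, 0.025]": "1%",
    "(0.05, inf]": "5%",
}

# Every other bin gets an empty string, so its key is drawn without text.
legend_labels_from_dict = [sparse_labels.get(bin_name, "") for bin_name in bin_order]
print(legend_labels_from_dict)
```

The resulting list matches the hand-written one, but adding or removing a bin no longer requires recounting the empty strings.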
Text and fonts
Next up are the text and fonts. I played with a few fonts on Google Fonts before settling on two. Note that this website uses these fonts with the help of brand.yml.
Install the fonts:
FONT_PRIMARY = "Playfair Display"
FONT_SECONDARY = "Lato"
import mpl_fontkit as fk
fk.install(FONT_PRIMARY)
fk.install(FONT_SECONDARY)
Font name: `Playfair Display`
Font name: `Lato`
mizani for axis breaks and labels
The breaks and labels of plotnine scales can be easily adjusted using mizani, which plays the same role for plotnine that the scales package plays for ggplot2.
We’re going to use mizani.breaks.breaks_date_width to put breaks for each year, and mizani.labels.label_date to drop the “month” part of the date.
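To show the intended result without depending on mizani, here is a standard-library sketch of "one break per year, labelled with the year only". The helper yearly_breaks is hypothetical, and the exact break positions mizani picks may differ from January 1st of each year.

```python
from datetime import date


def yearly_breaks(first_year, last_year):
    """One break at the start of each year, inclusive of both ends."""
    return [date(year, 1, 1) for year in range(first_year, last_year + 1)]


breaks = yearly_breaks(2018, 2025)
# "%Y" keeps only the four-digit year, dropping month and day.
labels = [d.strftime("%Y") for d in breaks]
print(labels)
```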
import mizani.labels as ml
import mizani.breaks as mb
plot = (
    ggplot(
        labour_processed_cutted.filter(
            pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(color="white", height=0.95)
    + theme_tufte()
    + theme(
        text=element_text(family=FONT_PRIMARY),  # <1>
        figure_size=FIGURE_THEME_SIZE,
        axis_text_y=element_text(family=FONT_SECONDARY),
        axis_text_x=element_text(family=FONT_SECONDARY),
        axis_title_y=element_text(weight=300),
        legend_justification_right=1,
        legend_position="right",
        legend_text_position="right",
        legend_title_position="top",
        legend_key_spacing=0,
        legend_key_width=15,
        legend_key_height=15,
        legend_text=element_text(size=8, family=FONT_SECONDARY),
        legend_title=element_blank(),
        plot_title=element_text(ha="left"),
        plot_subtitle=element_text(ha="left", margin={"b": 1, "units": "lines"}),
    )
    + scale_fill_manual(values=color_mapping, breaks=order, labels=legend_labels)
    + guides(fill=guide_legend(ncol=1, reverse=True))
    + scale_x_datetime(
        labels=ml.label_date("%Y"),  # <2>
        expand=(0, 0),
        breaks=mb.breaks_date_width("1 years"),
    )
    + labs(  # <3>
        title="Sector Shifts: Where Canada's Jobs Are Moving",
        subtitle=textwrap.fill(
            "Track the number of industries gaining or losing jobs each month. Boxes are shaded based on percentage change from previous month in each industry's employment levels.",
            width=75,
        ),
        x="",
        y="< SECTORS FALLING SECTORS RISING >",
    )
)
plot
1. Apply font family changes to the primary font in theme(...)
2. Use mizani to format labels to show only the year in scale_x_datetime
3. Add title, subtitle and wrap long lines with the help of textwrap
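As a quick standalone check of the wrapping step, this snippet runs textwrap.fill on the same subtitle and prints the result. It uses only the standard library; the variable names are mine.

```python
import textwrap

subtitle = (
    "Track the number of industries gaining or losing jobs each month. "
    "Boxes are shaded based on percentage change from previous month in "
    "each industry's employment levels."
)

# fill() breaks on whitespace so that no line exceeds `width` characters
# and joins the pieces with newlines, which the plot renders as line breaks.
wrapped = textwrap.fill(subtitle, width=75)
print(wrapped)
```

A width of 75 characters was chosen by eye to fit the 8 x 6 figure; a different figure size or font would likely need a different value.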
Highlighting an Industry
For more industry-specific insights, I would like to see where each individual industry ranks in the graphic.
1. Specify industry
2. Subset data
3. Add the subsetted data to another geom_point layer
Line plot of unemployment
Appendix
Things that didn’t work
This section is a non-exhaustive list of design elements I wasn’t able to solve with plotnine.
https://ggplot2.tidyverse.org/reference/geom_tile.html#aesthetics
Horizontal legend with horizontal legend text
Initially I wanted a horizontal legend for the colors. But in order to remove the whitespace between keys, I discovered that the text needs to be smaller than the legend keys, otherwise the labels “push” the legend keys apart unevenly. I attempted (unsuccessfully) to address this by making the legend text small, eliminating as much text as possible (e.g. removing the “%” characters for -0.50 and 0.50), and lastly increasing the legend key size.
But it still didn’t really work out the way I hoped, so I stuck with a vertical legend instead.